@BenjaminBraunDev
Contributor

BenjaminBraunDev commented Oct 28, 2025

This PR moves SLO-aware routing into a single multi-hook plugin that provides both a scorer (scheduling hook) and request-tracking hooks (requestcontrol hooks). Because the plugin tracks everything it needs internally, endpoint, podmetric, and datastore no longer need to change.

It also rebases close to the current main, hence the large number of changes.
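
As a rough illustration of the "track everything in the plugin itself" idea, the sketch below shows one type that serves as both a scorer and a request tracker while keeping its state private. The `Scorer` and `RequestTracker` interfaces, the method names, and the in-flight-count heuristic are all hypothetical stand-ins; the actual scheduling and requestcontrol hook signatures are not shown in this description.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the scheduling and requestcontrol hooks.
// The real framework interfaces may look quite different.
type Scorer interface {
	Score(requestID string, pods []string) map[string]float64
}

type RequestTracker interface {
	PreRequest(requestID, pod string)
	ResponseComplete(requestID string)
}

// sloPlugin keeps all per-request and per-pod state internally, so no
// shared endpoint, pod-metric, or datastore type has to change.
type sloPlugin struct {
	mu       sync.Mutex
	inFlight map[string]int    // pod -> running request count
	assigned map[string]string // request ID -> pod
}

func newSLOPlugin() *sloPlugin {
	return &sloPlugin{inFlight: map[string]int{}, assigned: map[string]string{}}
}

// Score favours pods with fewer in-flight requests (a placeholder heuristic,
// not the PR's latency-prediction logic).
func (p *sloPlugin) Score(_ string, pods []string) map[string]float64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	scores := make(map[string]float64, len(pods))
	for _, pod := range pods {
		scores[pod] = 1.0 / float64(1+p.inFlight[pod])
	}
	return scores
}

// PreRequest records which pod a request was routed to.
func (p *sloPlugin) PreRequest(requestID, pod string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.assigned[requestID] = pod
	p.inFlight[pod]++
}

// ResponseComplete releases the bookkeeping for a finished request.
func (p *sloPlugin) ResponseComplete(requestID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if pod, ok := p.assigned[requestID]; ok {
		p.inFlight[pod]--
		delete(p.assigned, requestID)
	}
}

// Compile-time checks that one type satisfies both hook interfaces.
var (
	_ Scorer         = (*sloPlugin)(nil)
	_ RequestTracker = (*sloPlugin)(nil)
)

func main() {
	p := newSLOPlugin()
	p.PreRequest("req-1", "pod-a")
	fmt.Println(p.Score("req-2", []string{"pod-a", "pod-b"}))
	p.ResponseComplete("req-1")
	fmt.Println(p.Score("req-3", []string{"pod-a", "pod-b"}))
}
```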

rlakhtakia and others added 30 commits October 27, 2025 23:13
…-sigs#1549)

Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.16.0 to 0.17.0.
- [Commits](golang/sync@v0.16.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sync
  dependency-version: 0.17.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ubernetes-sigs#1548)

Bumps [sigs.k8s.io/controller-tools](https://github.com/kubernetes-sigs/controller-tools) from 0.18.0 to 0.19.0.
- [Release notes](https://github.com/kubernetes-sigs/controller-tools/releases)
- [Changelog](https://github.com/kubernetes-sigs/controller-tools/blob/main/envtest-releases.yaml)
- [Commits](kubernetes-sigs/controller-tools@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/controller-tools
  dependency-version: 0.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* uniquely name CRBAC

Signed-off-by: greg pereira <[email protected]>

* bugfix with testing

Signed-off-by: greg pereira <[email protected]>

---------

Signed-off-by: greg pereira <[email protected]>
…#1518)

* Update type for priority from uint to int in EPP flow control

* Update tests to accommodate priority changes

* Avoid using Reverse to sort in descending order, update comments
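
One way to sort signed priorities in descending order without a `Reverse` wrapper is to flip the comparison itself. This is a minimal sketch; the function name and the sample values are illustrative and not taken from the EPP flow control code.

```go
package main

import (
	"fmt"
	"sort"
)

// sortPrioritiesDesc orders priority levels from highest to lowest.
// With int rather than uint, negative values sort below zero as expected.
func sortPrioritiesDesc(priorities []int) {
	// Comparing with ">" gives descending order directly,
	// so no sort.Reverse wrapper is needed.
	sort.Slice(priorities, func(i, j int) bool {
		return priorities[i] > priorities[j]
	})
}

func main() {
	p := []int{0, -5, 10, 3}
	sortPrioritiesDesc(p)
	fmt.Println(p)
}
```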
…mpletions (kubernetes-sigs#1446)

* - added more useful fields to types.LLMRequest:
1. cleaner API declaration
2. data fields are preserved, after-read transformations are done in plugins
3. prefix-cache scorer does not need naive templating
- minor bugfixes and improvements

Signed-off-by: Maroon Ayoub <[email protected]>

* removed LLMRequestData::String

Signed-off-by: Maroon Ayoub <[email protected]>

* - rename LLMRequestData to LLMRequestBody
- rename LLMRequest.Data to LLMRequest.Body
- test refactoring after rebase

Signed-off-by: Maroon Ayoub <[email protected]>

---------

Signed-off-by: Maroon Ayoub <[email protected]>
* epp servicemonitor and clusterpodmonitor templates

Signed-off-by: sallyom <[email protected]>

* add monitoring chart doc

Signed-off-by: sallyom <[email protected]>

---------

Signed-off-by: sallyom <[email protected]>
* Replace Gateway API with Inference Extension

This replaces references regarding Gateway API with references
regarding Gateway API Inference Extension (or vice versa, as
appropriate) in site-src/gieps/overview.md

* Replace spec with x-spec in a link

This fixes a link to the reference page for Inference Model.
Replacing InferenceModel with InferenceObjective
* Updating the guides in the doc site

* adding priority and capacity section
* feat(flowcontrol): Refactor FlowRegistry contracts

This commit refactors some of the core Flow Control contracts to improve
clarity and better align with their intended roles. The goal is to
create a more intuitive and robust interface for the upcoming top-level
FlowController.

Key changes include:

- The `FlowRegistryClient` interface is renamed to
  `FlowRegistryDataPlane` to more accurately reflect its role in the
  high-throughput request path.
- The `FlowRegistryAdmin` interface is renamed to `FlowRegistryObserver`
  to clarify its read-only, observational nature.
- The `ActiveFlowConnection.Shards()` method is renamed to
  `ActiveFlowConnection.ActiveShards()` to make it explicit that it
  returns only active, schedulable shards. This removes ambiguity for
  the distributor logic.
- `ShardStats` is enriched with `ID` and `IsActive` fields, providing
  consumers with more context about the shard's state at the time the
  snapshot was taken.
- The registry implementation has been updated to match these new
  contract definitions.

* refactor: Adapt ShardProcessor to a worker role

This commit refactors the `ShardProcessor` to function as a stateful
worker managed by a higher-level supervisor. This is a preparatory step
for the introduction of the new top-level `FlowController`.

The public API of the processor is changed from a direct `Enqueue`
method to a more sophisticated, channel-based submission model with
`Submit` (non-blocking) and `SubmitOrBlock` (blocking). This decouples
the producer from the processor's main loop, enabling better
backpressure signals and higher throughput.

Key changes include:

- Introduction of `Submit` and `SubmitOrBlock` for asynchronous request
  handoff.
- `FlowItem`'s finalization logic is improved to be more robust and
  channel-based.
- Error handling within the dispatch cycle is refactored (no logic
  change) to be more clear about how it promotes work conservation by
  isolating failures to a single priority band.

* feat: Introduce the FlowController supervisor

This commit introduces the `FlowController`, a high-throughput, sharded
supervisor that orchestrates a pool of stateful `ShardProcessor`
workers. This new component is the central processing engine of the Flow
Control system, implementing a "supervisor-worker" pattern.

Key features of the `FlowController` include:

- Supervisor-Worker Architecture: Acts as a stateless supervisor,
  managing the lifecycle of stateful `ShardProcessor` workers. It
  includes a reconciliation loop to garbage-collect workers for stale
  shards.
- Flow-Aware Load Balancing: Implements a "Join-Shortest-Queue-by-Bytes"
  (JSQ-Bytes) algorithm to distribute incoming requests to the
  least-loaded worker, promoting emergent fairness.
- Synchronous API: Exposes a blocking `EnqueueAndWait` method, which
  simplifies client integration (e.g., with Envoy `ext_proc`) and
  provides direct backpressure.
- Lazy Worker Initialization: Workers are created on-demand when a
  shard first becomes active to conserve resources and reduce contention
  on the hot path.
- Configuration: A new `Config` object allows for tuning parameters like
  TTLs, buffer sizes, and reconciliation intervals.
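
The selection step of a Join-Shortest-Queue-by-Bytes policy can be sketched as a scan for the worker with the fewest queued bytes. The `worker` struct, its fields, and the tie-breaking rule below are assumptions for illustration; the `FlowController`'s actual bookkeeping is not shown in this commit message.

```go
package main

import "fmt"

// worker is a hypothetical view of a shard processor's queue load.
type worker struct {
	id          string
	queuedBytes uint64
}

// pickLeastLoaded returns the index of the worker with the fewest queued
// bytes, or -1 if there are none. Ties go to the earliest candidate.
func pickLeastLoaded(workers []worker) int {
	best := -1
	for i, w := range workers {
		if best == -1 || w.queuedBytes < workers[best].queuedBytes {
			best = i
		}
	}
	return best
}

func main() {
	workers := []worker{
		{id: "shard-0", queuedBytes: 4096},
		{id: "shard-1", queuedBytes: 512},
		{id: "shard-2", queuedBytes: 2048},
	}
	i := pickLeastLoaded(workers)
	fmt.Println(workers[i].id)
	// Account for the new request so the next decision sees the updated load.
	workers[i].queuedBytes += 1024
}
```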

* docs: Update comments to align with FlowController

This commit updates documentation and code comments across various
framework components to align with the concepts and architecture
introduced by the `FlowController`.

Key changes include:

- FCFS Policy: Clarified the distinction between "logical" and
  "physical" enqueue time and the behavioral trade-offs when pairing
  with different queue capabilities.
- ListQueue: Expanded the documentation to explain its role as a
  high-performance, approximate FCFS queue in the context of the
  `FlowController`'s retry mechanics.
- Request Types: Refined the comments for `QueueItemAccessor` to be more
  precise about the meaning of `EnqueueTime`.

* refactor: Simplify controller lifecycle

This commit refactors the `FlowController` to simplify its startup and
shutdown lifecycle, making it more robust and easier to reason about.
It also incorporates several smaller improvements based on reviewer
feedback.

The primary change addresses a complex lifecycle implementation that
used an `atomic.Bool` (`isRunning`) and a `ready` channel to manage
state.

Key changes:

- **Simplified Lifecycle:** The controller's lifecycle is now tied
  directly to a `context` passed into `NewFlowController`. The `Run`
  method has been unexported, and the main `run` loop is started as a
  goroutine from the constructor. This eliminates the `ready` channel
  and `isRunning` flag in addition to simplifying the interface for
  callers.
- **Robust Worker Creation:** The `getOrStartWorker` logic has been
  improved to ensure that in a race to create a worker, the "losing"
  goroutine correctly cleans up its resources and does not start a
  redundant processor. This fixes a bug where the losing worker would
  evict all items from its queues on shutdown which were shared
  instances with the winning worker resulting in premature request
  finalization.
- **Comment Reduction:** The extensive explanatory comments in
  `distributeRequest` have been condensed to be more concise while
  retaining the essential details of the algorithm.

- **Minor Cleanups:**
    - The initial, unnecessary call to `reconcileProcessors()` at
      startup has been removed.
    - Error messages have been clarified (e.g., "acquire lease" instead
      of "establish connection").
    - A typed error for nil requests was replaced with a standard
      `errors.New`.
…s#1550)

Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.23.0 to 1.23.2.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](prometheus/client_golang@v1.23.0...v1.23.2)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-version: 1.23.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
learner0810 and others added 13 commits October 27, 2025 23:14
…s#1568)

* add latency predictor

* add cv in model and update epp deployment

* bug fix

* track mape for predictions

* add running queue size to metrics

* add xgboost regressor and update tpot sampling logic

* emit predicted and actual ttft tpot in body

* separate servers for training and prediction

* add latency predictor

put the predictor functions in director in a helper function

add scores to reqcxt

record prediction duration metrics

add prefix cache score to model input

slo based routing changes

retrieve request priority queue from the datastore

update scoring logic

* better initial implementation

Add scheduling profile, working state

remove latencypredictor from director

Move all latency prediction logic out of director and into scheduling profile. Make all Request/Response plugins take in RequestContext

* progress towards fixing up merge conflicts from latency predictor merge

* More refactor progress, fixing and adding tests

* working state, latency prediction

* Clean up changes, remove unneeded files, working functionality without latency flag and scheduling plugins

* Rebase cleanup, remove duplicate lines

* Integrate new alpha-beta slo scoring into scoring plugin

* Fix prefix cache scoring for slo-aware routing

* Add pycache or latency predictor to gitignore

* Rebase with main

* Fix prefix cache scoring being piped to latencyprediction_helper

* add dependencies in scorer

* change to single profile

* change to single profile

* restore two profiles

* restore two profiles

* restore two profiles

* update admit request to shed based on predictions

* add TODOs for future changes

* Change artifact registry references to personal compiled images

* Fix existing non-slo aware routing unit tests

* update latency predictor with better eval metrics

* Fix saturation detector unit test

* Change naming of SLO headers and prediction based routing header

* Remove port 9002 service on InferencePool causing make test to fail

* Fix epp hermetic integration test to expect ProcessingMode Send in response header

---------

Co-authored-by: kaushikmitr <[email protected]>
@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Oct 28, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 28, 2025
@k8s-ci-robot
Contributor

Hi @BenjaminBraunDev. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 28, 2025
@k8s-ci-robot
Contributor

Keywords which can automatically close issues and at(@) or hashtag(#) mentions are not allowed in commit messages.

The list of commits with invalid commit messages:


@kfswain
Collaborator

kfswain commented Oct 28, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 28, 2025
@kfswain
Collaborator

kfswain commented Nov 5, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 5, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenjaminBraunDev, kfswain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2025